Data import and preparation

This analysis makes use of the verts dataset saved on line 186 in Final_Code_Feb15.R.

## read in data
library(dplyr) # provides as_tibble(), %>% and relocate()
verts <- as_tibble(
  readRDS("../data-clean/lpi_and_wild_trait_data.RDS")) %>% 
  relocate(lpi, .before = Binomial) %>% 
  relocate(Group, TrophicLevel, .after = Binomial)

The lpi column indicates whether the species is represented within the C-LPI timeseries dataset. Note, however, that the species that do occur in C-LPI are duplicated in this dataset (i.e., they occur once with lpi = "C-LPI" and once with lpi = "C-Vertebrates").

table(verts$lpi)
## 
##         C-LPI C-Vertebrates 
##           845          1690
# n_distinct(verts$Binomial) ## [1] 1689

Let’s split the dataset in two, recreating vertslpi and vertswild, which were used to generate this combined dataset in the data preparation script (see above).

## C-LPI species
vertslpi <- verts %>% 
  filter(lpi == "C-LPI") %>% 
  arrange(desc(Group), Binomial) %>% 
  select(-lpi)
# n_distinct(vertslpi$Binomial) ## [1] 845

## all Canadian vertebrates (including C-LPI species)
vertswild <- verts %>% 
  filter(lpi == "C-Vertebrates") %>% 
  arrange(desc(Group), Binomial) %>% 
  select(-lpi)
# n_distinct(vertswild$Binomial) ## [1] 1689 -- one less than nrow()??
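
The distinct count in the comment above is one less than the number of rows, which would be explained by one Binomial appearing twice in vertswild. A minimal base-R sketch for locating such rows is shown here on a toy data frame; `toy_wild` and its values are hypothetical stand-ins, not the real data.

```r
## toy stand-in for vertswild (hypothetical values, not the real data)
toy_wild <- data.frame(
  Binomial = c("Species a", "Species b", "Species a", "Species c"),
  Group    = c("Mammals", "Mammals", "Mammals", "Fishes"),
  stringsAsFactors = FALSE
)

## names that occur more than once
dup_names <- unique(toy_wild$Binomial[duplicated(toy_wild$Binomial)])

## all rows for those names, so the duplicates can be compared side by side
dup_rows <- toy_wild[toy_wild$Binomial %in% dup_names, ]

## number of rows in excess of the number of distinct species
n_extra <- nrow(toy_wild) - length(unique(toy_wild$Binomial))
```

Applied to the real vertswild, the same three steps should show whether the extra row is an exact duplicate or a species with two different trait records.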

Next, we need to find the difference between these datasets, that is, the Canadian species that are not represented in the C-LPI. We will then use the two lists to create a different version of the dataset that includes a single row per species, with a column indicating whether the species occurs in the C-LPI subset (clpi = binary: yes, no).

## vector of species in C-Vertebrates but not C-LPI
wild_only <- setdiff(vertswild$Binomial, vertslpi$Binomial)

## check if there are any species in C-LPI not in C-Verts 
## (this shouldn't be possible)
# setdiff(vertslpi$Binomial,vertswild$Binomial) ## character(0) 

## create a new version of verts (verts2), with only a single row per species
## (rather than duplicating species found in both lpi and wild), and 
## adding a column indicating if the species is found in C-LPI
verts2 <- vertswild %>% 
  mutate(clpi = ifelse(Binomial %in% unique(vertslpi$Binomial), "yes", "no"), 
         clpi = factor(clpi, levels = c("yes", "no")))
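
As a quick check on the flagging logic, the sketch below repeats the setdiff() and %in% steps on two toy name vectors and confirms that the "yes" and "no" counts match the sizes of the two lists. The vectors `toy_wild_names` and `toy_lpi_names` are hypothetical; with the real data, the counts would only match exactly if every species has a single row.

```r
## toy stand-ins for vertswild$Binomial and vertslpi$Binomial (hypothetical)
toy_wild_names <- c("sp_a", "sp_b", "sp_c", "sp_d")
toy_lpi_names  <- c("sp_b", "sp_d")

## species in the full list but not in the C-LPI list
toy_only <- setdiff(toy_wild_names, toy_lpi_names)

## same yes/no flag as the clpi column, with "yes" as the first level
toy_flag <- factor(ifelse(toy_wild_names %in% toy_lpi_names, "yes", "no"),
                   levels = c("yes", "no"))

## the flag counts should equal the sizes of the two lists
counts <- table(toy_flag)
stopifnot(counts[["yes"]] == length(toy_lpi_names),
          counts[["no"]]  == length(toy_only))
```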

Plotting

First, let’s plot the distribution of two traits, body size and lifespan (sort of reproducing Fig. 2 from the manuscript). We can do this in two ways:

  1. Comparing the trait distributions of C-LPI to those of all Canadian vertebrates (i.e. including C-LPI)
## Warning: Removed 280 rows containing non-finite values (stat_density).
## Warning: Removed 964 rows containing non-finite values (stat_density).

  2. Comparing the trait distributions of C-LPI to those of Canadian vertebrates not represented in the C-LPI (i.e. excluding C-LPI)
## Warning: Removed 245 rows containing non-finite values (stat_density).
## Warning: Removed 771 rows containing non-finite values (stat_density).

Finally, let’s plot the comparisons of traits for species in and not in C-LPI again, this time faceting by taxonomic group.

## Warning: Removed 245 rows containing non-finite values (stat_density).

## Warning: Removed 771 rows containing non-finite values (stat_density).
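
The warnings above report how many rows were dropped from each density plot because the trait value was non-finite (most likely missing). The sketch below shows one way to tabulate those counts per trait and per subset in base R. The column names BodySize.log and LifeSpan.log are taken from the results summary further down; that the plots use these same columns is an assumption, and `toy_traits` is hypothetical data.

```r
## toy stand-in for verts2 with missing trait values (hypothetical)
toy_traits <- data.frame(
  clpi         = factor(c("yes", "yes", "no", "no", "no"), levels = c("yes", "no")),
  BodySize.log = c(1.2, NA, 0.4, NA, 2.1),
  LifeSpan.log = c(NA, 0.9, NA, NA, 1.5)
)
trait_cols <- c("BodySize.log", "LifeSpan.log")

## rows that a density plot would drop, per trait
n_missing <- sapply(toy_traits[trait_cols], function(x) sum(!is.finite(x)))

## the same count split by C-LPI membership (rows = clpi, columns = trait)
missing_by_group <- sapply(toy_traits[trait_cols], function(x) {
  tapply(!is.finite(x), toy_traits$clpi, sum)
})
```

Comparing these counts between the two subsets is a cheap way to see whether trait coverage itself differs between C-LPI and non-C-LPI species.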

Analysis

Approach

The analysis below applies estimation statistics to generate confidence intervals on the differences between the trait distributions of the C-LPI and non-C-LPI subsets. This is done using bootstrapping (i.e. repeated resampling with replacement to generate samples of the same size as the original), which is a non-parametric approach and thus works directly from the observed shape of each distribution. Once we have resampled each subset many times, we can estimate the mean (or median, in this case) difference between groups, that is, the effect size, with 95% confidence intervals based on the bootstrapped samples. This is possible even though the underlying distributions are highly non-Normal, because the interval comes from the bootstrap distribution of the difference itself rather than from an assumption that the trait values are Normally distributed.
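
To make the resampling idea concrete, here is a minimal base-R sketch of a bootstrapped difference in medians. It uses simulated, skewed data (`trait_yes` and `trait_no` are hypothetical) and a simple percentile interval, which is not the bias-corrected and accelerated interval reported in the results below, so treat it as an illustration of the principle rather than a reproduction of the analysis.

```r
set.seed(1)

## simulated, right-skewed stand-ins for one trait in each subset (hypothetical)
trait_yes <- rlnorm(200, meanlog = 1.0, sdlog = 0.8)
trait_no  <- rlnorm(150, meanlog = 0.6, sdlog = 0.8)

## observed effect size: median of "no" minus median of "yes"
observed <- median(trait_no) - median(trait_yes)

## resample each group with replacement, at its original size, many times
n_boot <- 5000
boot_diff <- replicate(n_boot, {
  median(sample(trait_no, replace = TRUE)) -
    median(sample(trait_yes, replace = TRUE))
})

## 95% percentile interval from the bootstrap distribution of the difference
ci <- quantile(boot_diff, c(0.025, 0.975))
```

Each group is resampled independently because the comparison is unpaired.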

Here I am using the dabestr package (see vignette) to run the bootstrapping (n = 5,000 resamples) and produce Gardner-Altman estimation plots of the results. The plots show the raw data (species-specific trait measurements), the mean/median for each group, and the mean/median difference between groups with its corresponding confidence interval.

Note: I’ve used a similar approach in the past and have found it to be a really intuitive way of bringing some statistical insight to the data. This, combined with a detailed description of the nuanced ways the distributions differ, will I think be quite convincing!

Again, there are two approaches we could take. I think both are interesting and could be incorporated into the results:

Reminder: these analyses are comparing C-LPI and non-C-LPI subsets

  1. Overall: compare the distributions, irrespective of taxonomic group

    (a). Comparison of body size distributions

    (b). Comparison of lifespan distributions

  2. Taxon-specific: compare the distributions separately for each taxonomic group

    (a). Comparison of body size distributions

    (b). Comparison of lifespan distributions
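
For the taxon-specific comparisons, the results summary reports a combined grouping variable, group_clpi, with labels such as Birds_n and Birds_y. How that variable was built is not shown, so the sketch below is an assumption: it pastes the taxonomic group onto the first letter of clpi and then computes the per-taxon difference in medians (no minus yes) on hypothetical toy data.

```r
## toy stand-in for verts2 (hypothetical values)
toy <- data.frame(
  Group        = rep(c("Birds", "Fishes"), each = 6),
  clpi         = rep(c("yes", "yes", "yes", "no", "no", "no"), times = 2),
  BodySize.log = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3),
  stringsAsFactors = FALSE
)

## combined label, e.g. "Birds_y" / "Birds_n" (assumed construction)
toy$group_clpi <- paste(toy$Group, substr(toy$clpi, 1, 1), sep = "_")

## median per taxon and subset (rows = taxon, columns = clpi)
med <- tapply(toy$BodySize.log, list(toy$Group, toy$clpi), median)

## per-taxon effect size, same sign convention as the results: no minus yes
diff_by_taxon <- med[, "no"] - med[, "yes"]
```

Pairing the labels by taxon keeps each comparison within a taxonomic group, so differences in body size between, say, birds and mammals do not leak into the effect sizes.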

Summary of Results

bs_est.md ## overall comparison - body size
## dabestr (Data Analysis with Bootstrap Estimation in R) v0.3.0
## =============================================================
## 
## Good evening!
## The current time is 19:20 pm on Thursday February 17, 2022.
## 
## Dataset    :  .
## X Variable :  clpi
## Y Variable :  BodySize.log
## 
## Unpaired median difference of no (n = 635) minus yes (n = 810)
##  -0.723 [95CI  -0.889; -0.58]
## 
## 
## 5000 bootstrap resamples.
## All confidence intervals are bias-corrected and accelerated.
ls_est.md ## overall comparison - lifespan
## dabestr (Data Analysis with Bootstrap Estimation in R) v0.3.0
## =============================================================
## 
## Good evening!
## The current time is 19:20 pm on Thursday February 17, 2022.
## 
## Dataset    :  .
## X Variable :  clpi
## Y Variable :  LifeSpan.log
## 
## Unpaired median difference of no (n = 267) minus yes (n = 652)
##  -0.465 [95CI  -0.648; -0.302]
## 
## 
## 5000 bootstrap resamples.
## All confidence intervals are bias-corrected and accelerated.
bs_est2.md ## taxon-specific comparison - body size
## dabestr (Data Analysis with Bootstrap Estimation in R) v0.3.0
## =============================================================
## 
## Good evening!
## The current time is 19:20 pm on Thursday February 17, 2022.
## 
## Dataset    :  .
## X Variable :  group_clpi
## Y Variable :  BodySize.log
## 
## Unpaired median difference of Birds_n (n = 58) minus Birds_y (n = 340)
##  1.53 [95CI  1.05; 2.08]
## 
## Unpaired median difference of Fishes_n (n = 456) minus Fishes_y (n = 336)
##  -0.774 [95CI  -0.934; -0.6]
## 
## Unpaired median difference of Herps_n (n = 33) minus Herps_y (n = 40)
##  1.57 [95CI  -0.0233; 2.95]
## 
## Unpaired median difference of Mammals_n (n = 88) minus Mammals_y (n = 94)
##  -4.48 [95CI  -6.45; -2.56]
## 
## 
## 5000 bootstrap resamples.
## All confidence intervals are bias-corrected and accelerated.
ls_est2.md ## taxon-specific comparison - lifespan
## dabestr (Data Analysis with Bootstrap Estimation in R) v0.3.0
## =============================================================
## 
## Good evening!
## The current time is 19:20 pm on Thursday February 17, 2022.
## 
## Dataset    :  .
## X Variable :  group_clpi
## Y Variable :  LifeSpan.log
## 
## Unpaired median difference of Birds_n (n = 44) minus Birds_y (n = 317)
##  0.596 [95CI  0.363; 0.939]
## 
## Unpaired median difference of Fishes_n (n = 118) minus Fishes_y (n = 204)
##  -0.811 [95CI  -1.06; -0.445]
## 
## Unpaired median difference of Herps_n (n = 32) minus Herps_y (n = 42)
##  -0.146 [95CI  -0.635; 0.33]
## 
## Unpaired median difference of Mammals_n (n = 73) minus Mammals_y (n = 89)
##  -0.693 [95CI  -1.23; -0.357]
## 
## 
## 5000 bootstrap resamples.
## All confidence intervals are bias-corrected and accelerated.
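
To read the taxon-specific output at a glance, the estimates printed above can be collected into one table and flagged by whether the 95% interval excludes zero. The numbers below are copied from the output above; the data frame `taxon_res` and its column names are only illustrative.

```r
## taxon-specific median differences (no minus yes), copied from the output above
taxon_res <- data.frame(
  trait = rep(c("BodySize.log", "LifeSpan.log"), each = 4),
  taxon = rep(c("Birds", "Fishes", "Herps", "Mammals"), times = 2),
  diff  = c(1.53, -0.774, 1.57, -4.48, 0.596, -0.811, -0.146, -0.693),
  lower = c(1.05, -0.934, -0.0233, -6.45, 0.363, -1.06, -0.635, -1.23),
  upper = c(2.08, -0.6, 2.95, -2.56, 0.939, -0.445, 0.33, -0.357),
  stringsAsFactors = FALSE
)

## TRUE when the whole interval lies on one side of zero
taxon_res$excludes_zero <- taxon_res$lower > 0 | taxon_res$upper < 0

## positive differences mean the non-C-LPI median is the larger one
taxon_res$direction <- ifelse(!taxon_res$excludes_zero, "interval spans zero",
                       ifelse(taxon_res$diff > 0, "non-C-LPI larger", "C-LPI larger"))
```

On these numbers, only the two Herps intervals span zero. The bird intervals are positive for both traits, whereas the fish and mammal intervals are negative for both.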

Notes

Other resources

Other plots

Breakdown of trophic levels in C-LPI and C-vert (only) subsets

Taxon-specific trophic levels in C-LPI vs. non-CLPI

Taxon-specific representation for the different traits (i.e., what proportion of Canadian birds in C-LPI have body size, lifespan, and trophic level data available?)

Figures for paper